{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Vector Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain-google-vertexai langchain-google-community"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Embedding` class provides a standardized `Runnable` for embedding models.\r\n",
    "The key methods of this class are` embed_quer`y and` embed_document`sts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_vertexai import VertexAIEmbeddings\n",
    "\n",
    "model_name = \"<embedding_model_name>\"\n",
    "embedding_model = VertexAIEmbeddings(model_name=model_name)\n",
    "\n",
    "single_embedding = embedding_model.embed_query(\"User query\")\n",
    "multiple_embeddings = embedding_model.embed_documents([\n",
    "\t\"Sample text 1\",\n",
    "\t\"Sample text 2\",\n",
    "\t\"Sample text 3\",\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The argument `model_name` specifies the version of the VertexAI model to use.\n",
    "\n",
    "For the most up-to-date information on available versions and their capabilities, please\n",
    "refer to the official Vertex AI embedding model [documentation page](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## VectorStores"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `VectorStore class` abstracts the entire vector search process at query time,\r\n",
    "providing a` similarity_searc`h method that accepts a query and returns a list of\r\n",
    "the most similar documents in the index\n",
    "\n",
    ". Optionally, you can add the paramete`r` k to\r\n",
    "specify how many similar documents you wish to retrieve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_vertexai.vectorstores import VectorSearchVectorStore\n",
    "\n",
    "embeddings = VertexAIEmbeddings(model_name=\"textembedding-gecko-default\")\n",
    "\n",
    "vector_store = VectorSearchVectorStore.from_components(\n",
    "        project_id=os.environ[\"PROJECT_ID\"],\n",
    "        region=os.environ[\"REGION\"],\n",
    "        gcs_bucket_name=os.environ[\"GCS_BUCKET_NAME\"],\n",
    "        index_id=os.environ[\"INDEX_ID\"],\n",
    "        endpoint_id=os.environ[\"ENDPOINT_ID\"],\n",
    "        embedding=embeddings\n",
    ")\n",
    "\n",
    "documents = vector_store.similarity_search(\"user_query\", k=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`VectorStore` instances can be converted into a standard retriever by invoking the\r",
    "`\n",
    "as_retrieve`r method, enabling the use of the basic retriever interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vector_store.as_retriever()\n",
    "documents = retriever.get_relevant_documents(\"user_query\", k=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## VertexVectorSearch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LangChain offers two distinct `VectorStores` for integration with Vector Search, each\r\n",
    "differentiated by the underlying Document Store they utilize.\n",
    "\n",
    "- ` VectorSearchVectorStor` \r\n",
    "uses Google Cloud Stora\n",
    "- `e VectorSearchVectorStoreDatasto` uses DataStorere\r",
    "re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_vertexai import (\n",
    "    VectorSearchVectorStore, # GCS Document Store\n",
    "    VectorSearchVectorStoreDatastore, # DataStore Document store\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can construct a `VectorSearchVectorStore` instance using the \n",
    "following snippet\r"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = VectorSearchVectorStoreDatastore.from_components(\n",
    "    project_id=\"my-project-id\",\n",
    "    region=\"my-region\",\n",
    "    index_id=\"my-index-name\",\n",
    "    endpoint_id=\"my-endpoint-name\",\n",
    "    embedding=embedding_model,\n",
    "    stream_update=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Regardless of your chosen storage backend (Datastore or Google Cloud Storage), the \n",
    "methods for interacting with the vector store remain consistent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "texts = [\n",
    "    \"This is the first document\",\n",
    "    \"This is the second document\",\n",
    "    \"This is the third document\"\n",
    "]\n",
    "vector_store.add_texts(texts=texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, if you anticipate using filtering later, you can enrich your documents with\r\n",
    "metadata during the addition process. This metadata will be stored both within Vector\r\n",
    "Search for efficient retrieval and in your chosen Document Store (either Datastore or\r\n",
    "Google Cloud Storage) for further processing or analysis.\r"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "texts = [\n",
    "    \"This is the first document\",\n",
    "    \"This is the second document\",\n",
    "    \"This is the third document\"\n",
    "]\n",
    "\n",
    "metadatas = [\n",
    "{\"page_number\": 1, \"length\": 10},\n",
    "{\"page_number\": 2, \"length\": 20},\n",
    "{\"page_number\": 3, \"length\": 5}\n",
    "]\n",
    "\n",
    "vector_store.add_texts(texts=texts, metadatas=metadatas)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the `VectorStore` interface to perform similarity searches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = vector_store.similarity_search(\"first\", k=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code snippet shows how to perform a similarity search\n",
    "using both numerical and string comparison filters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (\n",
    "    Namespace,\n",
    "    NumericNamespace,\n",
    ")\n",
    "\n",
    "filters = [Namespace(name=\"season\", allow_tokens=[\"spring\"])]\n",
    "numeric_filters = [NumericNamespace(name=\"price\", value_float=40.0, op=\"LESS\")]\n",
    "\n",
    "# Below code should return 2 results now\n",
    "vector_store.similarity_search(\n",
    "    \"shirt\", k=5, filter=filters, numeric_filter=numeric_filters\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CloudSQL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To leverage the LangChain integration with Cloud SQL, you'll need to install an\r\n",
    "additional library alongside` langchain-google-ertexai`p:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade langchain-google-cloud-sql-pg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The initial step involves constructing the engine object. \n",
    "\n",
    "This object defines the Google\n",
    "Cloud project, location, database, and instance that the `VectorStore` will interact with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_pg import PostgresEngine\n",
    "\n",
    "engine = PostgresEngine.from_instance(\n",
    "    project_id=\"my-project-id\", \n",
    "    region=\"my-region\", \n",
    "    instance=\"my-instance-name\", \n",
    "    database=\"my-database-name\"\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Making use of the engine object we just created, we can use the\n",
    "`init_vectorstore_table` method to create the necessary table in the\n",
    "database if it doesn't already exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_pg import Column\n",
    "\n",
    "engine.init_vectorstore_table(\n",
    "    table_name=\"my-table-name\",\n",
    "    vector_size=768,      \n",
    "    metadata_columns=[Column(\"PRICE\", \"FLOAT\")],\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the engine and the table are in place, you can proceed to initialize the\n",
    "`VectorStore`. \n",
    "\n",
    "As with other integrations, you will also need to build an embedding\n",
    "model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_pg import PostgresVectorStore\n",
    "from langchain_google_vertexai import VertexAIEmbeddings\n",
    "\n",
    "\n",
    "embedding = VertexAIEmbeddings(\n",
    "  model_name=\"textembedding-gecko@latest\", \n",
    "  project=\"my-project-id\"\n",
    ")\n",
    "\n",
    "store = PostgresVectorStore(\n",
    "    engine=engine,\n",
    "    table_name=\"my-table-name\",\n",
    "    embedding_service=embedding,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can add documents using the `add_documents` or `\n",
    "add_text`s methods.\n",
    "\n",
    " If you've created metadata columns, make sure to include thei \r\n",
    "values when adding documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import uuid\n",
    "\n",
    "all_texts = [\"Blue T-shirt\", \"Spring dress\", \"Black sunglasses\"]\n",
    "metadatas = [{\"PRICE\": 21.0}, {\"PRICE\": 23.0}, {\"PRICE\": 33.1}]\n",
    "ids = [str(uuid.uuid4()) for _ in all_texts]\n",
    "\n",
    "store.add_texts(all_texts, metadatas=metadatas, ids=ids)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To execute a similarity search, you can utilize the `similarity_search` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"I want glasses\"\n",
    "docs = store.similarity_search(query, k=2, filter=\"PRICE <= 10\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a final note, to optimize query performance, this `VectorStore` also provides a method\n",
    "for creating and applying a vector index to the table, as illustrated below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex\n",
    "\n",
    "idx = IVFFlatIndex()\n",
    "store.apply_vector_index(idx)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## BigQuery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = \"jzaldivar-test-project\"\n",
    "LOCATION = \"europe-west1\"\n",
    "DATASET = \"lcbook_dataset\"\n",
    "TABLE_NAME = \"my-table-name\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we create a dataset to store the data, using `google.cloud` bigquery client."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import bigquery\n",
    "\n",
    "client = bigquery.Client(project=PROJECT, location=LOCATION)\n",
    "client.create_dataset(dataset=DATASET, exists_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We must initialize a specialized `VectorStore` class to use\r\n",
    "VectorSearch in BigQuery. In this case, it is available through`\r\n",
    "langchaigoogle_n_communi`ty under the nam`BigQueryVectorStore`h\r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We must then initialize the specialized `VectorStore` class to use VectorSearch in Bigquery. \n",
    "\n",
    "It is available in the library `langchain_google_community`, under the name `BigQueryVectorStore`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_vertexai import VertexAIEmbeddings\n",
    "from langchain.vectorstores.utils import DistanceStrategy\n",
    "from langchain_google_community import BigQueryVectorStore\n",
    "\n",
    "embedding = VertexAIEmbeddings(\n",
    "  model_name=\"textembedding-gecko@latest\", \n",
    "  project=PROJECT\n",
    ")\n",
    "\n",
    "store = BigQueryVectorStore(\n",
    "    project_id=PROJECT,\n",
    "    dataset_name=DATASET,\n",
    "    table_name=TABLE_NAME,\n",
    "    location=LOCATION,\n",
    "    embedding=embedding,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The interface for adding texts to `BigQueryVectorStore` is consistent with other\n",
    "`VectorStore` subclasses. However, it's important to note that metadata, unlike in the\n",
    "CloudSQL integration, can only be stored within the designated `metadata_field` as\n",
    "a JSON object. This means you'll need to structure your metadata accordingly to\n",
    "leverage it for filtered searches within BigQuery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_texts = [\n",
    "    \"Blue T-Shirt\", \n",
    "    \"Spring Dress\", \n",
    "    \"Black sunglasses\", \n",
    "]\n",
    "metadatas = [{\"len\": len(t)} for t in all_texts]\n",
    "\n",
    "store.add_texts(all_texts, metadatas=metadatas)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While basic filtering is supported in this way, the `BigQueryVectorStore`\n",
    "integration does not yet allow full SQL statement filtering. More complex filtering\n",
    "operations may require the use of raw SQL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = store.similarity_search(\"I want a dress\", filter={\"len\": 12})\n",
    "print(docs)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
